626 research outputs found

    How To Pick The Best Regression Equation: A Review And Comparison Of Model Selection Algorithms

    This paper reviews and compares twenty-one different model selection algorithms (MSAs) representing a diversity of approaches, including (i) information criteria such as AIC and SIC; (ii) selection of a “portfolio” or best subset of models; (iii) general-to-specific algorithms; (iv) forward-stepwise regression approaches; (v) Bayesian Model Averaging; and (vi) inclusion of all variables. We use coefficient unconditional mean-squared error (UMSE) as the basis for our measure of MSA performance. Our main goal is to identify the factors that determine MSA performance. Towards this end, we conduct Monte Carlo experiments across a variety of data environments. Our experiments show that MSAs differ substantially with respect to their performance on relevant and irrelevant variables. We relate this to their associated penalty functions, and a bias-variance tradeoff in coefficient estimates. It follows that no MSA will dominate under all conditions. However, when we restrict our analysis to conditions where automatic variable selection is likely to be of greatest value, we find that two general-to-specific MSAs, Autometrics, do as well or better than all others in over 90% of the experiments.
    Keywords: Model selection algorithms; Information Criteria; General-to-Specific modeling; Bayesian Model Averaging; Portfolio Models; AIC; SIC; AICc; SICc; Monte Carlo Analysis; Autometrics
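The abstract above ties MSA performance to the penalty each criterion places on extra variables. As a minimal sketch of that idea (not the paper's own procedure), the following uses the textbook Gaussian-regression forms AIC = n ln(RSS/n) + 2k and SIC = n ln(RSS/n) + k ln(n), dropping constants shared by all models. The function names and the candidate-model numbers are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Tuple


def information_criteria(rss: float, n: int, k: int) -> Tuple[float, float]:
    """Return (AIC, SIC) for a Gaussian linear regression.

    rss: residual sum of squares, n: sample size, k: number of estimated
    coefficients. Constants common to all candidate models are omitted.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * k            # penalty of 2 per coefficient
    sic = fit_term + k * math.log(n)  # penalty of ln(n) per coefficient
    return aic, sic


def select_model(candidates: Dict[str, Tuple[float, int]], n: int) -> Tuple[str, str]:
    """Return the names of the models that minimise AIC and SIC.

    candidates maps a model name to (rss, k).
    """
    scores = {name: information_criteria(rss, n, k)
              for name, (rss, k) in candidates.items()}
    best_aic = min(scores, key=lambda name: scores[name][0])
    best_sic = min(scores, key=lambda name: scores[name][1])
    return best_aic, best_sic


# Hypothetical candidates: a 2-coefficient model and a 5-coefficient model.
candidates = {"small": (120.0, 2), "large": (112.0, 5)}
print(select_model(candidates, n=100))
```

With these made-up inputs the two criteria disagree: the heavier SIC penalty favours the smaller model while AIC favours the larger one, which is the kind of penalty-driven difference the abstract describes.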

    Evaluating Automatic Model Selection

    High yielding biomass ideotypes of willow (Salix spp.) show differences in below ground biomass allocation.

    Willows (Salix spp.) grown as short rotation coppice (SRC) are viewed as a sustainable source of biomass with a positive greenhouse gas (GHG) balance due to their potential to fix and accumulate carbon (C) below ground. However, exploiting this potential has been limited by the paucity of data available on below ground biomass allocation and the extent to which it varies between genotypes. Furthermore, it is likely that allocation can be altered considerably by environment. To investigate the role of genotype and environment on allocation, four willow genotypes were grown at two replicated field sites in southeast England and west Wales, UK. Above and below ground biomass was intensively measured over two two-year rotations. Significant genotypic differences in biomass allocation were identified, with below ground allocation differing by up to 10% between genotypes. Importantly, the genotype with the highest below ground biomass also had the highest above ground yield. Furthermore, leaf area was found to be a good predictor of below ground biomass. Growth environment significantly impacted allocation; the willow genotypes grown in west Wales had up to 94% more biomass below ground by the end of the second rotation. A single investigation into fine roots showed the same pattern, with double the volume of fine roots present. This greater below ground allocation may be attributed primarily to higher wind speeds, plus differences in humidity and soil characteristics. These results demonstrate that the capacity exists to breed plants with both high yields and high potential for C accumulation.

    Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context

    Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts.

    Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas

    This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin

    Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas

    Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, proposing a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN.

    Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

    Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment.

    Forecasting: theory and practice

    Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The lack of a free-lunch theorem implies the need for a diverse set of forecasting methods to tackle an array of applications. This unique article provides a non-systematic review of the theory and the practice of forecasting. We offer a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts, including operations, economics, finance, energy, environment, and social good. We do not claim that this review is an exhaustive list of methods and applications. The list was compiled based on the expertise and interests of the authors. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice.

    A Glycemia Risk Index (GRI) of Hypoglycemia and Hyperglycemia for Continuous Glucose Monitoring Validated by Clinician Ratings

    Background: A composite metric for the quality of glycemia from continuous glucose monitor (CGM) tracings could be useful for assisting with basic clinical interpretation of CGM data. Methods: We assembled a data set of 14-day CGM tracings from 225 insulin-treated adults with diabetes. Using a balanced incomplete block design, 330 clinicians who were highly experienced with CGM analysis and interpretation ranked the CGM tracings from best to worst quality of glycemia. We used principal component analysis and multiple regressions to develop a model to predict the clinician ranking based on seven standard metrics in an Ambulatory Glucose Profile: very low-glucose and low-glucose hypoglycemia; very high-glucose and high-glucose hyperglycemia; time in range; mean glucose; and coefficient of variation. Results: The analysis showed that clinician rankings depend on two components, one related to hypoglycemia that gives more weight to very low-glucose than to low-glucose and the other related to hyperglycemia that likewise gives greater weight to very high-glucose than to high-glucose. These two components should be calculated and displayed separately, but they can also be combined into a single Glycemia Risk Index (GRI) that corresponds closely to the clinician rankings of the overall quality of glycemia (r = 0.95). The GRI can be displayed graphically on a GRI Grid with the hypoglycemia component on the horizontal axis and the hyperglycemia component on the vertical axis. Diagonal lines divide the graph into five zones (quintiles) corresponding to the best (0th to 20th percentile) to worst (81st to 100th percentile) overall quality of glycemia. The GRI Grid enables users to track sequential changes within an individual over time and compare groups of individuals. Conclusion: The GRI is a single-number summary of the quality of glycemia. Its hypoglycemia and hyperglycemia components provide actionable scores and a graphical display (the GRI Grid) that can be used by clinicians and researchers to determine the glycemic effects of prescribed and investigational treatments.
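The two-component structure described in this abstract can be sketched in a few lines. The abstract itself gives no coefficients; the weights below (hypoglycemia component = VLow + 0.8 × Low, hyperglycemia component = VHigh + 0.5 × High, GRI = 3.0 × hypo + 1.6 × hyper, capped at 100) are the values commonly quoted for the GRI and should be treated here as an assumption to check against the full paper. The function names, the 1-to-5 zone numbering, and the sample percentages are hypothetical.

```python
import math
from typing import Tuple


def glycemia_risk_index(vlow: float, low: float,
                        vhigh: float, high: float) -> Tuple[float, float, float]:
    """Return (hypo_component, hyper_component, gri).

    Inputs are percentages of CGM time in each range. The weights are an
    assumption not stated in the abstract; verify them before any real use.
    """
    hypo = vlow + 0.8 * low    # very low-glucose weighted more than low
    hyper = vhigh + 0.5 * high  # very high-glucose weighted more than high
    gri = min(100.0, 3.0 * hypo + 1.6 * hyper)
    return hypo, hyper, gri


def gri_zone(gri: float) -> int:
    """Map a GRI in [0, 100] to one of five zones, 1 (best) to 5 (worst)."""
    if not 0.0 <= gri <= 100.0:
        raise ValueError("gri must lie in [0, 100]")
    return max(1, math.ceil(gri / 20.0))


# Hypothetical tracing: 1% very low, 3% low, 5% very high, 20% high.
hypo, hyper, gri = glycemia_risk_index(1.0, 3.0, 5.0, 20.0)
print(round(hypo, 1), round(hyper, 1), round(gri, 1), gri_zone(gri))
```

Keeping the two components as separate return values mirrors the abstract's recommendation that they be calculated and displayed separately, with the combined index as an additional summary.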